update benchmark data on VGG19 #5148
benchmark/IntelOptimizedPaddle.md
Machine:

- Server
  - Intel(R) Xeon(R) Gold 6148M CPU @ 2.40GHz, 2 Sockets, 20 Cores per socket
2 Sockets, 20 Cores per socket adds up to 40 cores.
On my machine, `cat /proc/cpuinfo` shows 80 processors.
Right, what I listed here is the number of physical cores. Seeing 80 means Hyper-Threading is enabled on your machine.
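The 40-versus-80 difference discussed above can be checked directly. The following is a minimal sketch, assuming the usual x86 Linux `/proc/cpuinfo` layout in which each `processor` entry also carries `physical id` and `core id` fields; the function name `count_cpus` is hypothetical and not part of the benchmark scripts.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo-style text.

    Assumes every "processor" entry lists "physical id" before "core id",
    as x86 Linux kernels normally do.
    """
    logical = 0
    cores = set()          # unique (socket, core) pairs
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue       # blank separator line between entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            cores.add((physical_id, value))
    return logical, len(cores)
```

Passing the contents of `/proc/cpuinfo` to `count_cpus` on the server described here would be expected to give 80 logical processors and 40 physical cores if Hyper-Threading is on, and 40 for both if it is off.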
benchmark/IntelOptimizedPaddle.md
  - DELL XPS15-9560-R1745: i7-7700HQ 8G 256GSSD
  - i5 MacBook Pro (Retina, 13-inch, Early 2015)
- Desktop
  - i7-6700k
The model information for Laptop and Desktop is incomplete; you could add a TODO here.
Sure, no problem. It can be filled in later, when you add the test data for those machines.
The models I listed here are the ones you listed in issue #5008.
benchmark/IntelOptimizedPaddle.md
- Desktop
  - i7-6700k

System: CentOS 7.3.1611
CentOS 6.3.10
Oh, the one I am using is 7.3.
benchmark/IntelOptimizedPaddle.md
|--------------|-------| -----| --------|
| OpenBLAS | 7.86 | 9.02 | 10.62 |
| MKLML | 11.80 | 13.43 | 16.21 |
| MKL-DNN | 29.07 | 30.40 | 31.06 |
The numbers I measured are 1.5-2x slower overall. OpenBLAS was built from source; MKLML and MKL-DNN were both run from the docker image.
BatchSize | 64 | 128 | 256 |
---|---|---|---|
OpenBLAS | 4 | 4.92 | not tested |
MKLML | 4.7 | 6.4 | 7.68 |
MKL-DNN | 20 | 20 | 21 |
From what you just said, Hyper-Threading is enabled on your system. In that case, it is better to set `export KMP_AFFINITY="granularity=fine,compact,1,0"` here.
The numbers from my script were measured with Hyper-Threading disabled.
Also, while the benchmark is running, it is worth checking with `perf top` that the MKL-DNN engines are all running correctly.
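As an illustration of the advice above, here is a minimal sketch that applies the quoted `KMP_AFFINITY` value only when Hyper-Threading appears to be on. The helper name `benchmark_env` and the idea of leaving the environment unchanged otherwise are assumptions; the thread does not say which setting the original script used with Hyper-Threading off.

```python
import os

# Value quoted in the review thread for machines with Hyper-Threading enabled.
HT_AFFINITY = "granularity=fine,compact,1,0"


def benchmark_env(logical_processors, physical_cores, base_env=None):
    """Return a copy of the environment for a benchmark run.

    KMP_AFFINITY is set to the thread's suggested value only when there are
    more logical processors than physical cores (Hyper-Threading on).
    Otherwise the environment is copied unchanged (an assumption).
    """
    env = dict(os.environ if base_env is None else base_env)
    if logical_processors > physical_cores:
        env["KMP_AFFINITY"] = HT_AFFINITY
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the benchmark script.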
After changing it to `export KMP_AFFINITY="granularity=fine,compact,1,0"`, the test results are still the same.
OK. Which BIOS version are you running? Also, are all the memory slots populated, and what frequency is the memory running at?
Here is the output of the `dmidecode` command:
dmidecode.log.txt
I don't think this is related to docker. mklml and mkldnn both run inside docker. Taking the first column, my improvement is (20-4.7)/4.7=3.25x, while yours is (29.07-11.8)/11.8=1.45x.
Could the mklml and mkldnn numbers also be measured with a local build? A single data point is enough, just to see whether the numbers improve.
I can build an openblas version inside docker and test it.
As for the missing-libc problem during local builds that you mentioned last time, I think it still needs to be solved.
Benchmarks are best run in docker, which avoids differences caused by differing environments.
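The improvement figures quoted above follow the formula (optimized - baseline) / baseline. A minimal sketch of that arithmetic, using only the first-column numbers from the two tables (the function name `relative_gain` is hypothetical):

```python
def relative_gain(baseline, optimized):
    """Improvement of `optimized` over `baseline`, as a fraction of baseline."""
    return (optimized - baseline) / baseline


# First column (batch size 64): MKLML is the baseline, MKL-DNN the optimized run.
reviewer_gain = relative_gain(4.7, 20.0)     # reviewer's numbers, inside docker
author_gain = relative_gain(11.80, 29.07)    # numbers from the PR table

print(f"reviewer: {reviewer_gain:.3f}x, author: {author_gain:.3f}x")
```

The results come out at about 3.255 and 1.464, which the thread writes truncated as 3.25 and 1.45.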
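> mklml and mkldnn both run inside docker. Taking the first column, my improvement is (20-4.7)/4.7=3.25x, while yours is (29.07-11.8)/11.8=1.45x.

That actually tells us something. Outside docker, the mkldnn-to-mklml ratio I measured is not that large, which suggests that your mklml number inside docker is on the low side, or that there is an underlying problem we have not found yet.

> I can build an openblas version inside docker and test it.

Yes, I agree. It is better to compare all three in the same environment.

> Benchmarks are best run in docker, which avoids differences caused by differing environments.

I agree with this as well. If possible, could you share your docker image with me? I will run it on my machine too, so we can first rule out basic configuration issues such as the machine itself.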
After a close look at the dmidecode output, I found that the machine's memory is indeed not in its best-performing configuration. 16 DIMMs are currently installed, and 8 of them share 4 channels.
The DIMMs in CPU0_A1, CPU0_D1, CPU1_A1 and CPU1_D1 need to be removed.
If the slots on the board are colored blue and black, this means removing all DIMMs from the black slots.
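The channel-sharing check described above can be sketched as follows. This assumes DIMM locators follow the pattern `CPU<socket>_<channel letter><slot index>`, which matches the four names given in the comment; every other locator name used with it, and the function name `shared_channels`, are hypothetical.

```python
import re
from collections import defaultdict

# Assumed locator pattern, e.g. "CPU0_A1": socket 0, channel A, slot 1.
LOCATOR = re.compile(r"^CPU(\d+)_([A-Z])(\d+)$")


def shared_channels(populated_locators):
    """Map each (socket, channel) holding more than one DIMM to its locators."""
    by_channel = defaultdict(list)
    for name in populated_locators:
        match = LOCATOR.match(name)
        if match is None:
            raise ValueError(f"unrecognized locator: {name!r}")
        socket, channel, _slot = match.groups()
        by_channel[(int(socket), channel)].append(name)
    # Keep only channels that are shared by two or more DIMMs.
    return {
        key: sorted(names)
        for key, names in by_channel.items()
        if len(names) > 1
    }
```

Given the populated locators reported by `dmidecode`, any channel in the result holds more than one DIMM, and removing the higher-numbered slot from each leaves one DIMM per channel.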
Many thanks to @BlackZhengSQ from the systems department for helping us fix the memory configuration.
With MKLDNN and batchsize=64, the result is now 26.67. Memory clearly has a large impact on performance.
However, there is still a gap between 26.67 and 28.46.
BatchSize | 64 | 128 | 256 |
---|---|---|---|
MKLML | 10.95 | 12.81 | 15.21 |
MKL-DNN | 26.67 | 28.06 | 28.65 |
Many thanks to @BlackZhengSQ from the systems department for helping us upgrade from CentOS 4.3 to CentOS 6.3.
With MKLDNN, the gap has now narrowed from 6% to 3%.
BatchSize | 64 | 128 |
---|---|---|
MKL-DNN | 27.69 | 28.8 |
Using the latest docker image, I get a speed of 64/2.24885 = 28.46, which roughly matches the 29.07 I measured earlier.
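The figures in this part of the thread can be reproduced with two small formulas. This is a sketch under the assumption that 2.24885 is the time in seconds for one batch of 64; the function names are hypothetical.

```python
def throughput(batch_size, seconds_per_batch):
    """Samples processed per second for one batch (assumes time is in seconds)."""
    return batch_size / seconds_per_batch


def gap_to_reference(value, reference):
    """Shortfall of `value` relative to `reference`, as a fraction."""
    return (reference - value) / reference


reference = throughput(64, 2.24885)                   # about 28.46
gap_after_memory_fix = gap_to_reference(26.67, 28.46)  # after fixing the DIMMs
gap_after_os_upgrade = gap_to_reference(27.69, 28.46)  # after the CentOS upgrade

print(f"{reference:.2f} {gap_after_memory_fix:.1%} {gap_after_os_upgrade:.1%}")
```

The two gaps round to the 6% and 3% quoted in the thread.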
41b46b4 to 2dccdc3
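It looks like the CentOS version does have some effect; the numbers in the commit were measured on bare metal running 7.2. For the MKL-DNN numbers, I also ran the benchmark inside docker 1.12.6:
batchsize 128
batchsize 256
Summarized below:
The largest difference is 1.7%, which shows there is essentially no difference between running inside and outside docker. Compared with the numbers on CentOS 6.3, the difference is around 3%.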
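The docker version does not have much effect on performance either. For the MKL-DNN numbers at batchsize=64, I ran the benchmark in docker 1.6.0 and 1.13.1:
docker 1.13.1:
docker 1.6.0:
All the numbers earlier in this conversation were measured under docker 1.6.0.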
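I compared against the data updated by @luotao1.
Data on CentOS 6.3:
Differences:
In terms of relative difference, only the first number is somewhat off; the MKL-DNN/MKLML ones are all fine.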
related #5008